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1  Introduction 

The  Fast  Multipole  Method  (FMM)  is  a  recently  developed  scheme  for  the  evaluation  of  po¬ 
tential  fields  which  has  been  used  in  a  variety  of  contexts,  including  the  numerical  solution 
of  the  Laplace  equation,  fluid  dynamics,  particle  simulations,  and  numerical  complex  analysis 
[8, 4,3, 1,7].  In  order  to  evaluate  the  fields  induced  by  a  collection  of  N  charges  on  each  other, 
the  FMM  requires  an  amount  of  work  proportional  to  N  rather  than  N2.  The  constant  of 
proportionality  depends,  in  turn,  on  the  cost  of  applying  a  translation  operator  to  a  multipole 
or  Taylor  expansion.  In  existing  codes,  the  latter  is  0(p2)  in  two  dimensions,  and  0(p *)  in 
three,  where  p  is  the  degree  of  the  expansion  (see  [8,4,5]).  The  value  of  p  used  by  the  FMM  is 
roughly  equal  to  log2(j)  ,  where  e  is  the  desired  precision  of  the  calculation. 

This  paper  can  be  viewed  as  the  sequel  to  [4]  and  [5].  Here,  we  describe  a  procedure  per¬ 
mitting  the  translation  operators  for  the  Laplace  equation  to  be  applied  to  arbitrary  pth  degree 
expansions  (both  multipole  and  Taylor)  for  a  cost  proportional  to  p  •  logp  in  two  dimensions, 
and  p2  dogp  in  three.  The  incorporation  of  this  technique  into  the  general  FMM  scheme  should 
speed  up  the  execution  of  two-dimensional  codes  by  a  factor  of  two  or  three  compared  with 
the  results  reported  in  [3],  and  is  expected  to  make  large-scale  three-dimensional  simulations 
affordable. 


2  Theory  in  Two  Dimensions 

2.1  Translation  operators 

The  following  three  lemmas  describe  translation  operators  for  multipole  and  power  series  ex¬ 
pansions  for  the  Laplace  equation  in  R2,  and  provide  error  bounds  allowing  the  manipulation 
of  these  expansions  in  the  manner  required  by  the  Fast  Multipole  Method.  Detailed  proofs  of 
Lemmas  2.1  -  2.3  below  can  be  found  in  [4],  The  first,  Lemma  2.1,  supplies  a  mechanism  for 
shifting  the  center  of  a  multipole  expansion. 

Lemma  2.1  (Translation  of  a  Multipole  Expansion)  Suppose  that 


OO  ^ 

4>{z)  =  a0  log {z  -  *0)  +  (z  _  Zoy 


is  a  multipole  expansion  of  the  potential  due  to  a  set  of  m  charges  of  strengths  gi,gj,  ■  ■  ■  ,gmi 
all  of  which  are  located  inside  the  circle  D  of  radius  R  with  center  at  zq.  Then  for  z  outside 
the  circle  Dj  of  radius  ( R  +  |^o|)  center  at  the  origin, 

<t>(z)  =  a0log(*)  +  ]T  Jj,  (2) 

1=1 

where 
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with  (fc)  the  binomial  coefficients.  Furthermore ,  for  any  p  >  1, 

±t  ^  ^  bt  ^  (  A  \  Izol  +  R  P+1 

4>{z)  -  a0 log (z) ^  n<0|+fl| J  -T— 

untA  A  defined  by  the  formula 

A  =  ±\«\. 


Lemma  2.2  describes  the  conversion  of  a  multipole  expansion  into  a  local  (Taylor)  expansion 
inside  a  circular  region  of  analyticity. 

Lemma  2.2  (Conversion  of  a  Multipole  Expansion  into  a  Local  Expansion)  Suppose 
that  m  charges  of  strengths  q\,  q2, ...,  qm  ore  located  inside  the  circle  D\  with  radius  R  and  center 
at  zo,  and  that  \zo\  >  (c  +  1)/?  with  c  >  1.  Then  the  corresponding  multipole  expansion  (l) 
converges  inside  the  circle  D2  of  radius  R  centered  about  the  origin.  Inside  D2,  the  potential 
due  to  the  charges  is  described  by  a  power  series: 

oo 

^)  =  D‘,2‘'  (6) 

1=0 

where 

b0  =  a0log(-z0)  +  (7) 

*=l  *0 

and 

‘.  =  -n?  +  jESf'tt'i1)*-1'*'  f”1*1-  <8> 

*  zo  zo  k=izo\  / 

Furthermore,  for  any  p  >  max  ^2,  ,  an  error  bound  for  the  truncated  series  is  given  by 

i / _.\  Y-ft  j  -  ^(4e(p+c)(c+i)+c2)  /’iy+1 

{ > - UJ  ■  (9) 

where  A  is  defined  in  (5)  and  e  is  the  base  of  natural  logarithms. 

Lemma  2.3  provides  a  formula  for  shifting  the  center  of  a  local  expansion  within  a  region  of 
analyticity.  In  the  formula  (10)  below,  n  can  be  either  a  natural  number  or  oo.  In  both  cases, 
expression  (10)  is  an  exact  one,  and  no  error  bound  is  needed. 


Lemma  2.3  (Translation  of  a  Local  Expansion)  For  any  complex  zo,z  and  {a*},  k  = 
0,1, 2,.  ..,n, 

~  zo)k  =  '52bl- zl  (10) 

*=o  1=0 

where 

h  =  ^ak^^J(~zo)k~l  (11) 


2.2  Convolution  Form  of  Translation  Operators 

For  the  sake  of  simplicity,  in  the  remainder  of  this  paper  we  will  assume  that  the  zero-order  term 
in  all  multipole  expansions  vanishes.  This  will  not  affect  the  resulting  complexity  estimates 
and  allows  us  to  ignore  the  logarithmic  terms  in  all  the  preceding  expressions. 

Note  now  that  equations  (3),  (7)  -  (8),  and  (10)  define  three  infinite-dimensional  linear 
operators  connecting  the  sequences  {<Xj}  and  {&»}.  In  numerical  calculations,  both  multipole 
expansions  of  the  form  (l)  and  Taylor  expansions  of  the  form  (6),  are  truncated  after  a  finite 
number  of  terms,  and  the  resulting  sums  are  used  in  place  of  the  original  series.  The  validity 
of  this  approximation  is  established  by  the  error  bounds  (4)  and  (9).  We  therefore  restrict 
our  attention  to  the  finite-dimensional  versions  of  formulas  (3),  (7),  (8),  and  (10).  Truncating 
all  expansions  in  Lemmas  2.1  -  2.3  after  p  terms  (p  >  1)  leads  to  three  linear  operators 
UP,VP,WP  :  C  x  <DP  — »  <CP  defined  as  follows.  The  vector  u  =  Up(zo,x)  is  given  by 

tt0  =  x0  =  0  (12) 


and 


for  l  =  1 , . . . ,  p  -  1  . 


The  vector  v  =  Vp(z0,x )  is  given  by 


for,=1 . '-1' 


Finally,  the  vector  w  =  Wp(zo,x)  is  given  by 
p- 1 


for  l  =  1 , . . . ,  p  -  1  . 


(13) 


(14) 


(15) 


Expanding  out  the  binomial  coefficient,  the  first  translation  formula  (13)  may  be  written 
in  the  following  manner: 
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(17) 


Similarly,  formulas  (14)  and  (15)  become 


and 
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respectively.  I 

In  this  form,  it  is  clear  that  all  three  translation  operators  can  be  viewed  as  convolutions 
preceded  and  followed  by  diagonal  scalings.  They  can  therefore  be  computed  by  means  of 
the  Fast  Fourier  Transform  (FFT)  and  the  total  cost  of  applying  a  translation  operator  to 
a  p  -  term  expansion  can  be  reduced  from  0(pi)  to  0(p  •  logp).  From  the  point  of  view  of 
complexity  theory,  this  is  a  significant  improvement.  However,  the  FMM  requires  fewer  than  20 
terms  for  single  precision  calculations,  and  fewer  than  50  terms  for  double  precision,  so  that  a 
straightforward  application  of  the  convolution  theorem  does  not  reduce  the  actual  computation 
times  significantly.  Fortunately,  a  more  considered  use  of  the  Fourier  transform  changes  the 
situation  considerably,  and  gives  rise  to  a  significantly  faster  implementation.  In  order  to 
describe  this  approach,  we  will  require  a  more  formal  analysis  of  the  translation  operators. 


2.3  Diagonalization  of  Translation  Operators 

We  proceed  by  introducing  some  of  the  notation  needed  in  this  section.  Suppose  that  p  and 

q  are  natural  numbers  with  p  >  q.  For  a  vector  x  €  C^,  we  will  denote  by  Tp',(x)  the  vector 

y  €  <Dq  defined  by  the  formula 

y i  =  Xi  for  t  =  0, . . .  ,q  -  1  ,  (19) 

and  refer  to  the  mapping  Tp,,  as  truncation.  For  any  vector  y  €  C ,  we  will  denote  by  Eq’p(y) 
the  vector  x  €  C*1  defined  by  the  formula 

Xi  =  y,  for  i  =  0, . . . ,  q  -  1  , 

Xi  =0  for  i  =  q, . . .  ,p  —  1  ,  (20) 

and  refer  to  the  mapping  EP*  as  embedding.  Finally,  for  any  natural  number  p,  we  will  denote 
by  7P  the  mapping  Cp  — *  Cp  defined  by  the  Discrete  Fourier  Transform. 

In  the  Fast  Multipole  Method,  the  operators  Up,Vp,Wp  are  repeatedly  applied  with  vari¬ 
ous  translation  vectors  zq  to  various  expansions  x  representing  the  fields  generated  by  specific 
combinations  of  particles.  We  will  denote  by  Up,  Vp,  and  Wp  the  matrices  representing  the  op¬ 
erators  UP,VP ,  and  Wp,  respectively.  The  following  three  theorems  describe  the  decomposition 
of  these  matrices  in  the  desired  form. 


Theorem  2.1  For  any  integer  p  >  1  and  zq  €  <D, 

Up  =  Sp  o  T2p,p  o  Kp  o  Ep,2p  o  (Sp)-1 

=  Sp  o  T2p,p  o  i?2”)-1  o  D\p  o  72p  o  J57p>2p  o  (Sp)_1, 

where  Sp  is  a  diagonal  matrix  defined  by  the  formula 

(Sp)u  =  (i  -  1)!  , 

A'p  m  a  periodic  convolution  with  the  finite  sequence  Rp  given  by 


(Rp){  =  j  #»  f°ri  =  0,...,p~  1; 

l  0,  for  i  =  p, . . .  ,2p  -  1  , 


and  D\p  is  a  diagonal  2 p  x  2 p  matrix  with 

=  (/2p  o  £?■*(#)).••  (24) 

Proof.  The  first  equality  in  (21)  follows  from  formula  (16).  The  second  is  an  immediate 
consequence  of  the  (discrete)  convolution  theorem. 


Theorem  2.2  For  any  integer  p  >  1  and  Zo  €  <D, 

Vp  =  (Qp)-loT2p'poK$op2PoEp'2po(Sp)-1 

=  ( Qp )"x  O  r2p'p  O  (72")'1  O  d\p  O  O  P2p  O  E?2p  O  (S")-1 

=  (Qp)'1  o  T2p-p  o  (J2")'1  o  D\p  o  P2p  o  72p  o  Ep'2p  o  (5p)_1  , 

where  Sp  is  defined  by  equation  (22),  Qp  is  a  diagonal  matrix  defined  by 

Wp)«  =  *'i  •(-!)'' , 

Kp  is  a  periodic  convolution  with  the  finite  sequence  R%  given  by 

(*$),.=  (£$•  f0T  1  =  0,  p  —  1; 

l  0,  for  x-p . 2p  -  1  , 

D\p  is  a  diagonal  2 p  x  2 p  matrix  with 

and  P2p  is  a  permutation  operator  defined  by  the  formula 

=  x-,mod  „ 


IWWWW 
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Proof.  The  first  equality  in  (25)  follows  from  formula  (17).  The  permutation  matrix  P2p  simply 
changes  the  indexing  scheme  so  that  the  summation  is  in  convolution  form.  The  third  equality 
is  obtained  from  the  second  by  the  observation  that  P2p  commutes  with  the  discrete  Fourier 
transform. 


Theorem  2.3  For  any  integer  p  >  1  and  zq  €  C, 

Wp  =  (Qp)_1oT2ppo  KloEF'^oQP 

=  (Qp)_1  O  T2p*p  O  (72p)~1  o  D2/  o  72v  O  Ep'2p  O  Qp  ,  (30) 

where  Qp  is  given  by  (26),  K%  is  a  periodic  convolution  with  the  finite  sequence  Rp  defined  by 

'  1,  for  i  =  0; 

(*?).•  =  /0r‘  =  1>'-*P;  (31) 

for  i  =  p+  l,...,2p-  1  , 

and  DlP  is  a  diagonal  2 p  X  2p  matrix  with 

(Df)ti  =  (?2p^'Jpra). (32) 

Proof.  The  first  equality  in  (30)  follows  from  formula  (18).  The  transfer  function  R%  has  been 
ordered  in  the  manner  indicated  in  (31)  so  that  the  summation  is  in  convolution  form. 


3  Incorporation  into  FMM 

Suppose  now  that  we  would  like  to  apply  the  matrix  Up  to  a  vector  x  S  C p.  Using  Theorem 
2.1,  we  can  evaluate  the  product  Up{x)  in  the  following  five  steps: 

A)  Apply  the  diagonal  matrix  (Sp)-1  to  the  vector  x. 

B)  Embed  the  vector  (Sp)-1(x)  in  CZp  by  zero-padding,  and  compute  its  discrete  Fourier 
transform  with  the  FFT. 

C)  Multiply  the  resulting  vector  by  the  diagonal  matrix  D\p  given  by  (24). 

D)  Compute  the  inverse  discrete  Fourier  transform  of  the  resulting  vector  with  the  FFT. 

E)  Truncate  this  length  2 p  vector  to  one  of  length  p,  and  apply  to  it  the  diagonal  matrix  Sp 
to  obtain  the  desired  result  Up{x). 

Obviously,  the  cost  of  the  above  procedure  is  dominated  by  that  of  steps  B  and  D,  each  of 
which  is  proportional  to  p-logp.  Thus,  the  cost  of  applying  the  matrix  Up  to  an  arbitrary  vector 
has  been  reduced  from  0(p2)  to  0(p'logp).  A  similar  procedure  permits  the  matrices  Vp  and 
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Wp  to  be  applied  to  arbitrary  vectors  in  a  “fast”  manner,  with  the  help  of  Theorems  2.2  and 
2.3,  respectively.  Nevertheless,  as  mentioned  previously,  the  number  p  varies  roughly  between 
10  and  50  in  practical  situations,  so  that  this  observation  does  not  reduce  the  computation 
times  significantly. 

The  surprising  fact  is  that  the  bulk  of  the  Fast  Multipole  Method  can  be  carried  out  in 
Fourier  space,  where  the  translation  operators  are  diagonal.  More  specifically,  we  recall  that 
in  Step  1  of  the  FMM  as  described  in  [4],  we  form  the  multipole  expansions  for  all  boxes  at  the 
finest  level  of  refinement.  In  Step  2,  we  then  obtain  the  multipole  expansions  for  all  boxes  at 
all  higher  levels  by  merging  (  Theorem  2.1  ),  each  translation  being  carried  out  according  to 
formula  (16).  But  suppose  that  we  add  a  new  Step  la,  in  which  each  expansion  at  the  finest 
level  is  scaled  by  Sp  and  Fourier-transformed  (Steps  A  and  B  above).  The  diagonal  matrix  DlP, 
which  depends  only  on  the  shift  vector  zo,  can  then  be  applied  to  each  expansion.  The  result  is 
the  shifted  expansion,  albeit  in  Fourier  space  and  scaled  by  the  matrix  Sp.  The  merger  of  the 
four  child  box  expansions  in  Step  2  can  then  be  carried  out  by  adding  together  the  coefficients 
in  Fourier  space.  But  this  sum  is  already  in  the  form  needed  to  carry  out  the  subsequent  shift. 
The  way  in  which  the  convolution  is  written  in  formula  (16)  should  make  this  clear.  Thus  the 
entire  upward  pass  of  the  FMM  can  be  carried  out  in  Fourier  space  and  we  have  the  multipole 
expansion  for  each  box  at  each  level  in  the  transform  domain. 

Let  us  turn  then  to  the  downward  pass  in  which  multipole  expansions  are  converted  to  local 
expansions  and  local  expansions  are  transmitted  to  child  boxes.  By  examination  of  formula 
(17),  it  is  clear  that  these  scaled,  Fourier-transformed  multipole  expansions  are  already  in 
appropriate  form  for  the  conversion  to  be  a  diagonal  procedure.  In  the  decomposition  of  the 
operator  Vp  given  by  formula  (25),  we  have  in  essence  already  applied  the  operators 

72p  o  Ep'2p  o  (SP)"1  ,  (33) 

so  that  the  shift  corresponds  to  multiplication  by  the  permutation  operator  P2p  given  by  (29) 
and  the  matrix  D\p  given  by  (28).  The  new  expansion  is  the  desired  local  expansion,  again  in 
Fourier  space  and  scaled  by  Qp.  The  local  expansions  can  then  be  accumulated  in  this  dual 
form.  Finally,  in  the  second  loop  of  Step  3,  the  net  local  expansions  are  transmitted  to  the 
child  boxes  by  applying  the  operator  Wp .  But,  as  before,  we  have  already  implicitly  applied 
the  operators 

72p  o  Ep'2p  O  Qp  ,  (34) 

so  that  the  shift  corresponds  to  multiplication  by  the  matrix  DlP  given  by  formula  (32).  The 
resulting  expansion  is  the  local  expansion  for  the  child  box,  in  Fourier  space  and  scaled  by  Qp. 
Once  the  finest  level  is  reached,  it  remains  only  to  apply  the  operators 

( Qp)-1oT2p’po{72p)~ 1  (35) 

to  each  local  expansion,  in  order  to  obtain  the  coefficients  in  the  original  coordinate  space. 

In  the  original  formulation  of  the  method,  as  described  in  [4],  the  amount  of  work  required 
by  all  the  shifting  procedures  is  of  the  order 

Nk-27-p2  (36) 
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complex  operations,  where  Nk  is  the  number  of  boxes  in  the  refinement  structure.  As  described 
in  this  section,  it  costs  roughly 

Nk  •  27  •  p  +  Nk  •  10  •  p  •  log2  p  (37) 

complex  operations,  assuming  that  one  application  of  the  FFT  to  a  vector  of  length  p  costs 
5  •  p  •  log2  p  operations.  Even  for  relatively  small  p,  this  is  a  significant  improvement. 

Unfortunately,  the  strategy  indicated  above  will  not  work  well  when  the  desired  accuracy  is 
high  (p  is  large).  The  reason  for  this  is  that  the  convolution  operators  contain  factorial  terms 
which  exceed  the  precision  of  the  machine  fairly  quickly.  This  problem  can  be  overcome  by 
scaling,  leads  to  a  more  complicated  scheme,  and  is  addressed  in  more  detail  in  section  5. 

4  Theory  in  Three  Dimensions 

The  three-dimensional  version  of  the  FMM  is  based  on  spherical  harmonic  expansions.  These 
arise  from  consideration  of  the  Laplace  equation  in  spherical  coordinates 


i  a  f  2a$\  i  a  /  .  a$\  1 

r2  dr  V  dr  )  r2  sin  9  30  \  R  39  )  +  r2  sin 
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The  standard  solution  of  this  equation  by  separation  of  variables  results  in  an  expression  for 
the  field  as  a  series,  the  terms  of  which  are  known  as  spherical  harmonics. 


n-Om=-n  '  r  ' 


n=0m=— n  x  ' 

In  the  above  expansion,  the  terms  Y™(9,<f>)rn  are  usually  referred  to  as  spherical  harmonics 
of  degree  n,  the  terms  Y™{9,<l>)/rn+1  are  called  spherical  harmonics  of  degree  — n  -  1,  and  the 
coefficients  L™  and  M™  are  known  as  the  moments  of  the  expansion.  In  a  far  field  (multipole) 
expansion,  the  coefficients  Uff  are  set  to  zero.  In  a  local  (Taylor)  expansion,  the  coefficients 
M™  are  zero. 

The  following  three  theorems  are  the  generalizations  to  three  dimensions  of  Lemmas  2.1, 
2.2,  and  2.3.  Theorem  4.3  below  can  be  found  in  a  somewhat  different  form  in  the  literature 
[2,9].  Theorems  4.1  and  4.2  are  recent  results,  described  by  the  authors  in  [5,6]. 

Theorem  4.1  (Translation  of  a  Multipole  Expansion)  Suppose  that  l  charges  of  strengths 
7i>7a»,"i 91  are  located  inside  the  sphere  D  of  radius  a  with  center  at  Q  =  (p,ac,(3),  and  that 
for  points  P  =  (r,0,  <j>)  outside  D,  the  potential  due  to  these  charges  is  given  by  the  multipole 
expansion 


*(P)  =  E  £  ^  -CW) 


n=0  m=  -n 


where  P  —  Q  =  (r1 ,0' Then  for  any  point  P  =  (r,  9,4>)  outside  the  sphere  D\  of  radius 
(<*  +  P), 


vVv‘.AA 


(41) 


*(^)  =  £  £ 


j=Ok=-j 


where 


j  n  (~)k~m  •  Jk~m  .  Am  .  Ak~m  •  nn  •  Y~m(n  ft) 

Nk  —  ^2  £  u’~n  An  A*~n  _  In  ^a'p' 

n=Om=  —  n 


untA  A*  defined  by 

Am  _ 
— 

and  where 


(-1)" 


\f{n-  m)!  •  (n  +  m)!  ’ 


jm>  _  f  (_l)mm(|m'MmD>  ,/ m  .  m'  <  0; 
m  \  1,  otherwise. 

Furthermore,  for  any  p  >  1, 


j=Ok=-j 


p+i 


(42) 


(43) 


(44) 


(■■5) 


Theorem  4.2  (Conversion  of  a  Multipole  Expansion  into  a  Local  Expansion)  Suppose 
that  l  charges  of  strengths  q\,q2,'"  ,  <7i  are  located  inside  the  sphere  Dq  of  radius  a  with  center 
at  Q  =  (p,a,0),  and  that  p  >  (c  +  l)a  with  c  >  1  Then  the  corresponding  multipole  expansion 
(40)  converges  inside  the  sphere  Do  of  radius  a  centered  at  the  origin.  Inside  Do,  the  potential 
due  to  the  charges  q\,qi,' '  *  ,qi  is  described  by  a  local  expansion: 


00  j 


*(*>)  =  £  £  N3k-YJk(9,4>)-r}  . 

>=o *=-> 


where 


Nk  =Y'  V'  °n-j:m-K-Ak-Y,Tnk(^0) 


n=0  m  =  —  n 

with  A *  defined  by  equation  (43)  and  where 


jn'.m'  =  f  (-l)n'(-l)m,n{|m'l  |m|),  if  m  ■  m!  >  0; 

m  \(— 1)"',  otherwise. 


Furthermore,  for  any  p  >  1, 

p  ; 


•HP)  -EE  Y-vns.i)^-'  <  (I)’*1 


(46) 


(47) 


(48) 


(49) 


Theorem  4.3  (Translation  of  a  Local  Expansion) 
Let  Q  =  (p,a,)3)  be  the  origin  of  a  local  expansion 


*(/>)  =  E  E  OZ.Y?(9',<t>l)-r'n  , 

n=0  m=-n 

where  P  =  (r,9,<f>)  and  P  -  Q  =  ( r',9',4> ').  Then 


p  i 


*(P)  =  E  E  N*.Y,k(e,4>)-p  , 

j=Ok=-j 


where 


p  n  r\m  im  .  im-i  tk  ym-kf  a\  n-j 

N-  =  E  E  Un  Jn~j,m~k  An~j  Aj  Yn~j  p 


n=j  tn=  —  n 

with  A‘  defined  by  equation  (fS)  and  where 

(  (-l)"(-l)m,  if  m  •  m'  <  0; 

if  m  ■  m'  >  0  and  |m'|  <  \m\; 
l  (  — l)n,  otherwise. 


(50) 


(51) 


(52) 


(53) 


4.1  Convolution  Form  of  Translation  Operators 

As  in  the  two-dimensional  case,  equations  (42),  (47),  and  (52)  define  three  infinite-dimensional 
linear  operators  connecting  the  sequences  {O™}  and  {N™}.  Again,  in  practice,  both  multipole 
expansions  of  the  form  (40)  and  Taylor  expansions  of  the  form  (46),  are  truncated  after  a  finite 
number  of  terms,  and  the  resulting  sums  are  used  in  place  of  the  original  series.  The  error 
bounds  (45)  and  (49)  indicate  the  accuracy  of  these  approximations.  We  restrict  our  attention 
therefore  to  the  finite-dimensional  versions  of  formulas  (42),  (47),  and  (52).  Truncating  all 
expansions  in  Lemmas  4.1  -  4.3  after  the  pth  degree  (p  >  l)  leads  to  three  linear  operators 
UP,VP,WP  :  R3  x  <Cp*2p+1  — >  Cpx2p+1. 

We  proceed  now  to  write  the  translation  operators  in  convolution  form.  To  do  so,  we  will 
require  the  following  obvious  lemma  expressing  the  constants  J™  ,  ,  and  J™m  in  terms 

of  powers  of  «  =  \/—  1. 


Lemma  4.1  The  constant  in  equation  (44)  Is  given  by 


jm'  _  t-|m+m'|- |m|-|m'| 
fn 

(54) 

in  equation  (48)  is  given  by 

(55) 

n'm  ,n  equation  (5S)  is  given  by 

(57) 


The  first  truncated  formula  (42)  may  then  be  written  in  the  following  manner: 

*? ■  M  A  A  Ojl” ■ 


V1-*:  s 


n= 0  m=— r» 

Similarly,  formulas  (47)  and  (52)  become 


j|fc— m| 


and 


AT/  •  «i*l  p  " 

^  •  (~1  )J  ^onvtln  *'|m|  ‘  (-l)n+>  •  *  ^+"+1  ’ 

a} •  (-iy 


(58) 


(59) 


respectively. 

It  is  clear  that  all  three  translation  operators  can  be  viewed  as  two-dimensional  convolutions 
preceded  and  followed  by  diagonal  scalings.  They  can  be  computed  by  means  of  the  FFT,  and 
the  total  cost  of  applying  a  translation  operator  to  a  pth  degree  expansion  can  be  reduced 
from  0(p*)  to  0(p2  -logp).  By  contrast  with  the  two-dimensional  case,  this  already  provides  a 
noticeable  improvement  in  computation  time,  since  the  net  gain  grows  as  p2/logp  rather  than 
as  p/logp.  As  with  the  two-dimensional  case,  however,  the  entire  algorithm  can  be  formulated 
in  the  Fourier  transform  domain,  giving  rise  to  an  even  faster  numerical  scheme. 


4.2  Diagonalization  of  Translation  Operators 

As  in  the  two-dimensional  case,  we  proceed  by  introducing  some  notation.  Suppose  that  p  and 
q  are  natural  numbers  with  p  >  q.  For  a  vector  z  e  <Cpx2p+1,  we  will  denote  by  Tp  ,(z)  the 
vector  y  €  <C,x2,+1  defined  by  the  formula 

yj  =  x{  for  i  =  0, . . . , q  -  1  and  j  =  -  », . . . , »  (60) 

and  refer  to  the  mapping  Tp,q  as  truncation.  For  any  vector  y  e  C,x2<?+1,  we  will  denote  by 
Eq'p(y)  the  vector  z  €  Cpx2p+1  defined  by  the  formula 

*•  =y\  for  i  =  0,...,g  -  1  and  j  =  , 

z,-  =0  for  «  =  q, . . .  ,p  —  1  and  j  =  . . . , «  ,  (61) 

and  refer  to  the  mapping  Eq,p  as  embedding.  Finally,  for  any  natural  number  p,  we  will 
denote  by  7P  the  mapping  Cpx2p+1  — +  <CPx2p+1  defined  by  the  two-dimensional  discrete  Fourier 
transform.  We  denote  by  Up, VP,  and  Wp  the  matrices  representing  the  operators  UP,VP ,  and 
Wp ,  respectively.  The  matrix  decompositions  described  below  are  the  analogs  of  Theorems  2.1 
-  2.3. 
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Theorem  4.4  For  any  integer  p  >  1  and  {p,ct,f})  G  R3, 

UP  =  Sp  o  T2p,p  o  K[  o  Ep,2p  o  (Sp)-1 

=  SpoTip'po(?*p)-loDlPo?2poEp-2po{Sp)-\  (62) 

where  Sp  is  a  diagonal  operator  defined  by  the  formula 

{s'wr  =  ,  (es) 

Kp  is  a  periodic  two-dimensional  convolution  with  the  finite  sequence  Rp  6  <C2px4p+i  given  by 


■,  for  n  =  0,.. .  ,p-l  and  m  =  -n,. . .  ,n  ; 
otherwise  , 


and  D\p  is  a  diagonal  operator  with 

{ DlP(z)K  =  (72p  o  ^'2p(i?f)}-.  (65) 

Proof.  The  first  equality  in  (62)  follows  from  formula  (57).  The  second  is  an  immediate 
consequence  of  the  (discrete)  convolution  theorem. 


Theorem  4.5  For  any  integer  p  >  1  and  (p,a,fi)  G  R3, 

VP  =  {Qp)~l  oTip,p  °  K$o  P2p  o  E?'ip  o  (Sp)~l 

=  (Qp)_1  o  T2pp  o  {T2”)'1  o  D\p  o  72p  o  P2p  o  F?'2p  o  (Sp)_1 
=  (Qp)"1  o  T2p'p  o  ( J2p)~ 1  o  D\p  o  P2p  o  J2p  o  Ep'2p  o  ( Sp)~l  ,  (66) 

where  Sp  is  defined  by  equation  (68),  Qp  is  a  diagonal  operator  defined  by 

_m  .  | |m| 

-  (6?) 

Kp  is  a  periodic  two-dimensional  convolution  with  the  finite  sequence  R%  €  <C2px4p+1  given  by 


l  o. 


'  for  n  =  ■  • , p -i  <*nd  m  =  **»,■ 

otherwise  , 


D\p  is  a  diagonal  operator  with 

{DlP(x)K  =  {72p  o  Ep  2p(Rp2)}?  , 
and  P2p  is  a  permutation  operator  defined  by  the  formula 

<p’'W}r  =  . 
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Proof.  The  first  equality  in  (66)  follows  from  formula  (58).  The  permutation  matrix  P2p 
changes  the  indexing  scheme  so  that  the  summation  is  in  convolution  form.  The  third  equality 
is  obtained  from  the  second  by  the  observation  that  Pip  commutes  with  the  two-dimensional 
discrete  Fourier  transform. 


Theorem  4.6  For  any  integer  p  >  1  and  (p,a,fi)  6  R3, 

Wp  =  (Qp)-1  o  T2p,p  oK\o  Ep'2p  o  Qp 

=  (gP)"1  o  TJp,P  o  ( 72p)~x  o  D\p  o  ?2p  o  Ep’2p  o  QP  ,  (71) 

where  Qp  is  given  by  (67),  K%  is  a  periodic  two-dimensional  convolution  with  the  finite  sequence 

RP  e  C2px4p+l  defined  by 


(J*)“  =  K-:p2’-n+1 


for  n  =  0  and  m  =  0; 

for  n  =  p+l,...,2p-l, 
m  =  n-2p,. . .  ,2p-n; 

otherwise  , 


and  D\p  is  a  diagonal  operator  with 

=  {7”  o  .  (73) 

Proof.  The  first  equadity  in  (72)  follows  from  formula  (59).  The  transfer  function  R%  has  been 
ordered  in  the  indicated  manner  so  that  the  summation  is  in  convolution  form. 


Remark:  The  three-dimensional  version  of  the  FMM  has  been  modified  to  incorporate  the 

translation  operators  in  diagonal  form.  The  use  of  Fourier  space  and  the  description  of  this 
modification  are  essentially  identical  to  that  described  in  Section  3.  We  simply  note  here  that 
in  the  original  formulation  of  the  method,  as  described  in  [5,6],  the  amount  of  work  required 
by  all  the  shifting  procedures  is  of  the  order 

Nk  •  875  •  p*  (74) 

complex  operations,  where  N *  is  the  number  of  boxes  in  the  refinement  structure.  In  the  new 
implementation,  it  costs  roughly 

Nk  •  875  ■  p2  +  Nk  •  10  ■  p2  ■  log,  p  (75) 


complex  operations,  assuming  that  one  application  of  the  FFT  to  a  vector  of  length  p  costs 
5-p-logjp  operations.  The  improvement  is  clearly  more  dramatic  than  in  the  two  dimensional 
case. 

Remark:  The  constant  875  in  the  operation  count  above  refers  to  the  number  of  boxes  at  a 

fixed  refinement  level  with  which  each  box  must  interact.  Zhao,  in  [10],  makes  an  observation 
allowing  this  number  to  be  reduced  to  approximately  200.  The  idea  is  simply  that  if  each 
of  eight  child  boxes  are  in  the  interaction  list  of  a  given  box,  then  that  box  can  transmit  its 
multipole  expansion  once  to  the  parent  node  rather  than  eight  times  to  the  individual  children. 

5  Numerical  Stability 

We  turn  now  to  the  matter  of  numerical  stability.  As  indicated  in  Section  3,  when  the  desired 
precision  is  high  and  the  number  of  terms  in  the  expansions  is  large,  the  strategy  delineated 
there  will  not  work  well.  The  convolution  operators  in  both  two  and  three  dimensions  contain 
terms  which  grow  as  p!,  where  p  is  the  degree  of  the  expansion.  While  the  FFT  is  stable  in  the 
sense  that  it  is  a  unitary  operator,  these  terms  quickly  exceed  the  available  machine  precision. 

In  order  to  reformulate  the  FMM  in  a  way  which  avoids  this  problem,  we  must  reexamine 
the  convolution  form  of  the  translation  operators.  In  this  discussion,  we  will  concentrate  on 
the  two-dimensional  algorithm.  The  numerical  considerations  in  the  three-dimensional  case  are 
essentially  the  same. 

There  are  two  measures  which  may  be  taken  to  reduce  the  dynamic  range  of  the  problem. 
The  first  is  to  split  the  input  vector  into  two  (or  more)  blocks,  and  to  transform  each  block 
separately.  This  requires  that  the  translation  procedures  be  reconstructed  in  block  form.  We 
will  describe  a  second  approach  consisting  simply  of  polynomial  scaling,  which  allows  the 
translation  procedures  to  retain  their  form.  For  this  purpose,  we  rewrite  formula  (16)  in 
the  following  way: 

ul  8  _  z*8  z0  8 

(1-1)!  ~x  {k  -  1)!  (/  -  k)\ 

with  8  an  arbitrary  constant.  Similarly,  formulas  (17)  and  (18)  become 


Xk  •  s*  (l  +  k  -  1)! 
k-  1)!  ‘  (-z0)'+fc-  gH*  ’ 


s'  s*  *  (fc  -  /)!  (78) 

respectively.  It  remains  to  choose  8.  By  inspection  of  the  preceding  formulas,  it  is  clear  that 
we  wish  to  reduce  the  dynamic  range  of  sequences  whose  nth  entry  is  of  the  form 

n!  (r-s)n 

7 - r—  or  - ; —  ,  (79) 

(r  •  s)n  n!  ’  v  ' 


■»  ...v  .v.v.  .  .'  • 

*, * _  ■_ . . .  ,.  .  . 


or 


where  r  =  |zo|.  This  is  true,  not  only  of  the  transfer  functions,  but  of  the  expansion  sequences 
themselves.  (It  is  easy  to  show  that  the  multipole  expansion  coefficients  at  a  given  level  of 
refinement  grow  approximately  as  r"  and  the  local  expansion  coefficients  grow  approximately 
as  r-n,  where  r  is  the  average  distance  of  a  shift.) 

A  reasonable  strategy  for  choosing  s  is  to  require  that  r  •  s  =  p.  To  see  this,  let  us  con¬ 
sider  the  most  troublesome  sequence,  namely  the  transfer  function  in  equation  (77).  Unsealed, 
with  p  =  16,  the  sequence  values  range  from  0(1)  to  O(1035).  With  the  above  choice  of  s, 
the  corresponding  range  is  from  O(10~J)  to  O(10~6),  a  reduction  of  31  orders  of  magnitude. 
Unfortunately,  this  requires  a  different  scaling  constant  at  each  refinement  level,  and  the  fol¬ 
lowing  modification  of  the  FMM.  Let  us  suppose  that,  at  a  given  refinement  level,  the  scaled 
translation  operators  have  been  applied.  Before  moving  to  the  next  level,  either  up  or  down, 
each  expansion  sequence  must  be  rescaled.  Since  the  scaling  is  done  in  the  original  coordinate 
space,  this  requires  three  steps: 

1.  Apply  inverse  FFT  to  each  sequence. 

2.  Multiply  kth  term  in  sequence  by  (s0id/*ntw)k- 

3.  Apply  forward  FFT  to  each  sequence. 

The  expansions  will  then  be  in  appropriate  form  for  the  next  set  of  translation  procedures. 
The  amount  of  extra  work  involved  in  the  rescaling  steps  is  roughly 

~  ■  10- p-log2p  ,  (80) 

where  Nt  is  the  number  of  boxes  in  the  refinement  structure.  The  factor  of  four  arises  from  the 
fact  that  the  rescaling  of  expansions  is  necessary  at  coarser  levels,  but  not  at  the  finest  level. 
This  is  a  clearly  a  small  addition  to  the  workload  when  compared  with  the  order  estimate  for 
the  whole  algorithm  (37),  and  does  not  affect  the  execution  time  in  a  substantial  way. 

It  should  be  noted  that  scaling  is  not  a  universal  remedy.  The  factorial  and  polynomial 
terms  behave  quite  differently  when  p  is  large,  and  the  dynamic  range  does  grow.  Nevertheless, 
for  single  precision  calculations,  the  loss  of  accuracy  can  be  held  to  less  than  one  digit.  For 
double  precision  calculations,  approximately  three  digits  are  lost.  If  more  precision  is  needed,  a 
hybrid  procedure  can  be  constructed  using  both  scaling  and  the  block  decomposition  mentioned 
at  the  beginning  of  the  section. 

6  Numerical  Results 

Computer  programs  have  been  written  in  Fortran  77  using  both  the  original  formulations  of 
the  algorithm  [4,5]  and  the  formulations  presented  here.  We  compare  the  performance  of 
these  methods  for  three-dimensional  free  space  calculations.  Charged  particles  were  randomly 
assigned  positions  in  a  cube,  so  that  the  resulting  particle  density  was  roughly  uniform.  The 
potential  fields  were  computed  in  four  ways:  by  the  original  algorithm  in  single  precision,  by 
the  new  algorithm  in  single  precision,  and  directly  in  single  and  double  precision.  The  direct 
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calculation  of  the  field  in  double  precision  was  used  as  a  standard  for  comparing  the  relative 
accuracies  of  the  other  three  methods.  The  number  of  particles  varied  between  1,000  and 
64,000,  with  charge  strengths  randomly  assigned  between  zero  and  one. 

All  calculations  cited  below  have  been  carried  out  on  a  VAX-8600,  and  the  results  are 
summarized  in  Table  1.  The  first  column  of  each  table  contains  the  number  of  particles  N  in  the 
calculation.  The  second  column  contains  the  number  of  refinement  levels  nlev.  In  the  remaining 
columns,  the  upper  case  letters  T  and  E  are  used  to  denote  the  corresponding  computational 
time  and  error,  with  the  subscripts  new,  old  and  dir  referring  to  the  new  algorithm,  the  original 
algorithm,  and  the  direct  (single-precision)  calculation  respectively.  Columns  3  through  5  show 
the  times,  in  seconds,  required  to  compute  the  field  by  the  three  methods.  The  errors  Enevj, 
Eoid  and  Edir  for  the  new,  original  and  direct  methods,  respectively,  are  presented  in  the  next 
three  columns.  They  are  defined  by  the  formula 


E  = 


(  £&» 1  h  -  U 


*\ 


1/2 


where  /,  is  the  value  of  the  field  at  the  »-th  particle  position  obtained  by  direct  calculation  in 
double  precision  and  U  is  the  result  obtained  by  one  of  the  three  methods  being  studied. 

Remark:  For  the  tests  involving  8,000  or  more  particles,  it  was  not  considered  practical  to 

use  the  direct  method  to  calculate  the  fields  at  all  particle  positions,  since  this  would  require 
prohibitive  amounts  of  CPU  time  without  providing  much  useful  information.  We,  therefore, 
used  the  direct  method  to  evaluate  the  field  for  only  100  of  the  particles,  and  used  these 
results  to  evaluate  the  relative  accuracies.  The  corresponding  values  of  Tj,r  were  estimated 
by  extrapolation.  The  values  of  T0u  for  32,000  and  64,000  points  were  also  estimated  by 
extrapolation. 


On  the  basis  of  the  data  in  Table  1,  we  may  make  the  folowing  observations: 


1.  The  accuracies  of  the  results  obtained  by  the  feist  algorithms  are  in  agreement  with  the 
error  bounds  given  in  (45)  and  (49). 

2.  The  actual  CPU  time  requirements  of  the  fast  algorithms  appear  to  grow  somwhat  errat¬ 
ically  and  nonlinearly  with  N,  since  the  linear  bound  for  the  execution  time  is  well  above 
the  actual  execution  times  until  the  number  of  particles  is  quite  large. 

3.  By  the  time  the  number  of  particles  reaches  64,000  the  new  FMM  is  about  15  times  faster 
than  the  direct  method.  This  improvement  is  smaller  than  in  the  two-dimensional  case, 
where  the  speed-up  would  be  by  a  factor  of  200.  It  should  be  pointed  out,  however,  that 
the  number  of  points  required  for  the  resolution  of  a  given  problem  are  much  greater  in 
three  dimensions.  64,000  points,  for  example,  yield  the  resolution  of  a  40  x  40  x  40  grid. 
In  two  dimensions,  the  same  spatial  refinement  requires  only  1600  points,  at  which  point 
the  fast  algorithm  yields  a  speed-up  of  only  a  factor  of  8-10  (see  [3]). 
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N 

1  CM 

Told 

Tdir 

Enew 

Eoid 

Edir 

1000 

3 

23.4 

4.4  10"7 

4.4  10"7 

4.8  10"y 

2000 

3 

59.0 

■TVS 

5.7  10~7 

5.6  10"7 

6.6  10“7 

4000 

3 

179 

504 

291 

9.3  10~7 

9.3  10"7 

1.1  10“® 

8000 

4 

403 

3,939 

1134 

3.5  10"® 

3.4  10-® 

1.2  10"6 

16000 

4 

744 

4,280 

(4,490) 

3.8  10"® 

3.8  10“® 

3.1  10"® 

32000 

4 

2490 

(9,210) 

(17,960) 

4.0  10~® 

— 

4.1  10“® 

64000 

5 

4,810 

(44,230) 

(71,830) 

4.5  10~6 

— 

3.7  10“® 

Table  1:  CPU  times  and  error  estimates  for  N-body  calculations  in  three  space  dimensions. 
Particles  uniformly  distributed.  Degree  of  harmonic  expansions  p  =  8  . 

7  Conclusions 

A  new  implementation  of  the  Fast  Multipole  Method  has  been  developed  by  applying  transla¬ 
tion  operators  to  both  multipole  and  Taylor  expansions  in  a  system  of  coordinates  where  they 
are  diagonal.  We  estimate  that  for  single  precision  results,  the  speed-up  for  two-dimensional 
calculations  is  on  the  order  of  a  factor  of  two  or  three.  We  have  demonstrated  that  in  three- 
dimensional  calculations,  the  speed-up  obtained  varies  between  a  factor  of  three  and  ten. 

The  irregularity  of  the  speed-up,  as  well  as  the  jumps  in  the  CPU  time  requirements  in  Table 
1,  are  due  to  the  discrete  number  of  refinement  levels  available  to  the  non-adaptive  algorithm. 
Work  is  in  progress  on  an  adaptive  version  of  the  method,  which  is  the  three-dimensional 
analog  of  the  scheme  described  in  [3],  We  expect  a  significant  improvement  in  the  execution 
time,  even  for  fairly  homogeneous  distributions  of  particles.  The  relative  improvement  of  the 
new  implementation  should  then  remain  fairly  constant  at  a  factor  of  eight.  For  double  precision 
calculations,  the  relative  improvement  will  be  on  the  order  of  a  factor  of  thirty-two. 
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